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Abstract 

Key insights in molecular biology, such as enzyme kinetics f25l, protein 
allostery |l26l[20l and gene regulation [IJ, emerged from quantitative analy- 
sis based on time-scale separation, allowing internal complexity to be elimi- 
nated and resulting in the well-known formulas of Michaelis-Menten, Monod- 
Wyman-Changeux and Ackers-Johnson-Shea. In systems biology, steady- 
state analysis has yielded eliminations that reveal emergent properties of multi- 
component networks | 331|38l[37 l- Here we show that these analyses of nonlin- 
ear biochemical systems are consequences of the same linear framework, con- 
sisting of a labelled, directed graph on which a Laplacian dynamics is defined, 
whose steady states can be algorithmically calculated. Analyses previously 
considered distinct are revealed as identical, while new methods of analysis 
become feasible. 
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The linear framework 



Biological systems may sometimes be in steady state, as when synthesis or growth is 
balanced by degradation or loss. Typically, this holds only for a limited time period. 
It may also sometimes be reasonable to assume an explicit separation of time scales, 
in which a sub-system is operating fast compared to the rest of the system. The fast 
components may then be treated as if they are at steady state relative to the slow 
components. In either context, steady-state analysis is required. The framework 
introduced here provides a systematic way to calculate steady states, with broad 
applicability to biochemical systems. 

We start from a graph, G, consisting of vertices, 1, • • • , n, with labelled, directed 
edges i j and no self loops, i -/^ i (Figure [Tj\). The vertices represent compo- 
nents of a system, on which a dynamics is defined by treating each edge as if it 
were a first-order chemical reaction under mass-action kinetics, with the label as 
rate constant. This gives a system of linear, ordinary differential equations (ODEs), 

where x is a column vector of component concentrations and C{G) is the Lapla- 
cian matrix of G. Such matrices were introduced by Kirchhoff [19] and resemble 
discretisations of the Laplacian operator (see the Appendix). 

Since material is neither created nor lost, the total concentration, Xtot = xi + 

\- Xn, remains constant at all times, so that 1^ .C{G) = 0, where 1 is the all-ones 

column vector and ^ denotes transpose. 

Nonlinearity can be encoded either in the vertices or, more commonly, in the 
labels. Labels are real numbers, a G M, which may be algebraic expressions over 
a set of symbols, {/xi, • • • , /x^}. Symbols may be rate constants, k, or concentra- 
tions, [X], of chemical species X. For instance, X may be a slow component in a 
time-scale separation. All calculations are in terms of symbols, whose numerical 
values do not have to be known in advance, thereby avoiding problems of parameter 
estimation. Labels must have dimensions of (time)"^ and be positive, a G M>o. 

A crucial restriction is that if a concentration symbol, [X] , appears in a label in 
G, then X must be an external species and not correspond to a vertex in G. This 
"uncoupling condition" is essential to preserve linearity and is the key requirement 
for applications of the framework. 

The Laplacian, jC{G), is a n x n matrix over R. The interest lies in the steady 
states of ([T]), for which dx/dt = 0, or, equivalently, x is in the kernel of the Lapla- 
cian, X G ker C{G). The kernel can be determined in two steps, first for a strongly 
connected graph and then for any graph. 

A strongly connected graph is one in which any two distinct vertices can be 
joined by a series of edges in the same direction. While this depends only on the 
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edge structure and not on the labels, the sign of a label determines the direction of 
flux. For strongly-connected graphs with positive labels, the dimension of ker C{G) 
is one, [37J. In this case, Tutte's Matrix-Tree Theorem (MTT) describes a basis 
element, p G ker£(G), [40J. To calculate p^, take the product of all the labels 
on a spanning tree of G rooted at vertex i and add the products over all such trees 
(Figure [T^, box). A spanning tree is a fundamental concept in graph theory; it is a 
subgraph of G that contains each vertex of G (spanning) which has no cycles when 
edge directions are ignored (tree); it is rooted at i if i is the only vertex with no 
outgoing edges in the tree. Spanning-tree calculations are shown in Figure [T^ and 
Figures 1 and 2 of the Appendix. 

The kernel could have been calculated using determinants. The significance of 
the MTT is that it expresses pi as a polynomial in the labels with positive coefficients 
(Figure [T^). This resolves the alternating signs that arise with determinants and 
ensures that steady-state concentrations remain positive, so long as the labels are 
positive. Being able to algorithmically calculate steady states in terms of labels 
is the essence of the framework. The MTT has been frequently rediscovered in 
biology in various guises, llT8l[T2l . 

If X is any steady-state, then, since dimker£(G) = 1, we know that x = Xp, 
where A G M. The undetermined A reflects the amount of matter in the system. It 
can be removed by normalising in different ways: 

1. Xi = i — ]xi 2. Xi = { — ] xtot . (2) 

\PiJ \PtotJ 

In 1 , one of the vertices, by convention vertex 1, is chosen as a reference. In 2, XtQt 
plays a similar role, with ptot = Pi + ' ' ' + Pn- 

Equation ([2]) shows that the n components in the system can be eliminated in 
favour of rational expressions, Pi/pi or Pi/ptot^ the labels of which may involve the 
concentrations of other components. This dramatic simplification is a consequence 
of strong connectivity and is central to the time-scale separation applications dis- 
cussed below. 

If G is an arbitrary graph, it can be decomposed into strongly connected com- 
ponents (SCCs), which inherit from G a directed graph structure, G, that has no 
directed cycles (Figure [T]C). Since there is no net flux of material into the initial 
SCCs in G, it can be shown that only the terminal SCCs contribute to any steady 
state (Appendix). For each terminal SCC, t, let p^ G be the vector which, for 
vertices in that SCC, agrees with the values coming from the MTT applied to that 
SCC in isolation, while for any other vertex, j t, = 0. These vectors form a 
basis for the kernel of the Laplacian: 

ker/:(G) = (p\...,p^), (3) 



3 



where T is the number of terminal SCCs. By construction, if i is any vertex, 

{p^)i 7^ if, and only if, i G t (4) 

A description of k.ei C{G) appears in the Appendix of [9J. The construction given 
here goes further in using the MTT to give explicit expressions for the basis ele- 
ments in terms of the labels. 

Four applications are discussed next. The first stands apart from the rest in not 
being a time-scale separation. It illustrates the wide scope of the framework. The re- 
maining applications show how the MTT systematises the eliminations arising from 
time-scale separation. In each case the framework integrates classical and modern 
analyses of biochemical systems. The intention is not to reveal new results in each 
area but to show that, rather than being different calculations, they are all the same 
calculation, made manifest in the labelled, directed graphs that appear in Figures [2] 
to[5j Following these applications, an extension to the framework is introduced that 
allows for synthesis and degradation of components (Figure [6]). Some specialised 
results for thermodynamic equilibrium are outlined in the Appendix. 



Chemical Reaction Network Theory 

For a reversible chemical reaction between species Si, - • • ,Sk and species Pi , • • • , P/, 

aiSi + • • • + akSk 5 /3iPi + • • • + A^/ 
k- 

mass-action kinetics implies a Haldane relationship f6l at equilibrium, 

[^i]"^ • • • [SkY^ ^ ^ ... 

[P^Y^ ■ ■ ■ [P,]A k-- ^ ^ 

Formula ([5]) may also be deduced from thermodynamics and, here, kinetics is con- 
sistent with thermodynamics. However, a network of reactions may have kinetic 
equilibria that do not satisfy thermodynamic constraints |[22l . The condition of "de- 
tailed balance" was introduced to avoid such paradoxes [[22l |23j]. This plays an 
important role at equilibrium, as explained in the next section. 

In a seminal paper [14J, Horn and Jackson, sought to extend thermodynamic 
properties like ([5]) to steady states far from equilibrium. Under mass-action kinet- 
ics, any reaction network gives rise to a system of nonlinear ODEs, dc/dt = /(c). 
To disentangle the nonlinearity, the expressions that appear on either side of a re- 
action were treated as new entities called "complexes", so that a chemical reaction 
network, A^, with m species, gave rise to a labelled, directed graph, Gn, on n 
complexes (Figure [2]). The nonlinear function / on species is replaced by the linear 
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Laplacian, C{Gn), on complexes, with the labels being just the rate constants of the 
corresponding reactions. Here, the nonlinearity is entirely encoded in the vertices. 
(Horn and Jackson defined the function on complexes without being aware of its in- 
terpretation as a graph Laplacian.) The two functions, one acting on species and the 
other on complexes, are linked by a linear function y : ^ and a nonlinear 
function ^1 : R^ R^ (Figure [2} caption). These encode the stoichiometry of the 
species in the complexes in such a way that that the diagram in Figure [2] commutes, 
/(c) = YC{Gn)^{c), Only ^1 is nonlinear, revealing a substantial linearity within 
the dynamics, arising from the graph-theoretic structure. This decomposition is the 
starting point of CRNT, MM^- 

Formula ^ applies to C{Gn) and plays a fundamental role. If c is positive, 
c G (R>o)^, then, by definition, so is ^^(c) G (R>o)^. Hence, a positive steady 
state, with /(c) = 0, can arise in only one of two ways: either C{Gn)^{c) = or, 
if not, then Y C{Gn)^{c) = 0. In the first case, c is said to be "complex balanced". 
It then follows from ([3]) that 

t=i 

where G R. Let u and v be two complexes in the same terminal SCC of Gat, say 
t = t*. Suppose that the multiplicity of species iinuh ui and in ^ is Using (|4]) 
and the definition of (Figure [2} caption). 

The term on the right depends only on the rate constants and this "quasi-thermostatic" 
property lfT4ll generalises the Haldane relationship in (|5]). With the MTT, the gener- 
alised "equilibrium constants" can now be explictly calculated in terms of the rate 
constants. 

Horn and Jackson showed further that complex balancing satisifies other prop- 
erties expected of thermodynamic equilibria, justifying it as a non-equilibrium gen- 
eralisation of detailed balancing lfT4l . 

Formula ([3]) has also provided modern insights. For instance, (|6]) shows that 
complex-balanced steady states are generated by polynomials with only two terms 
(binomials), 

(/).(cr---c)-(/)„(c----c) = o 

and therefore form a toric algebraic variety [[l0l[71|, similar to those arising from log- 
linear models in algebraic statistics [[30||. This, and other recent results, [24J, have 
introduced methods of algebraic geometry to the analysis of molecular reaction 
networks. 

Formula ([3]) remains useful even without a complex-balanced steady state. In the 
simplest case, ker YC{Gn) contains only one additional basis element compared to 
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ker C{Gn)' If c is a positive steady state, then 

T 

t=i 

where x is the additional basis element and A G M. Because x be non-zero at 
any complex, the Haldane- style formulas in ([6]) can no longer be deduced, except 
in the case where u and v are not in any terminal SCC. If u and v also differ only 
in a single species k, so that Ui = Vi for i k, then Ck depends only on the rate 
constants 

f Xu 
Ck= [ — 

and exhibits "absolute concentration robustness". This is the Shinar-Feinberg The- 
orem [l34), which has particular applications to bifunctional enzymes, where the 
robustness is suppored by experimental evidence [|2l l35l[36l . 



Reversible ligand binding 

Reversible binding of ligands to a substrate is a feature of many cellular processes, 
such as gene regulation, [IJ, and protein allostery, [26]. The linear framework may 
be readily applied by assuming that the time-scale of binding is well- separated be- 
tween faster upstream interactions, such as ligand dimerisation, and slower down- 
stream processes that react to the binding, such as gene expression (Figure |3]). 

Consider a substrate that may exist in multiple states. These may, for instance, 
be states of DNA looping or nucleosome organisation at a promoter or conforma- 
tional states in an allosteric protein. Ligands may bind reversibly to the substrate 
with potentially overlapping site preferences, cooperativity and dependence on sub- 
strate state. A labelled, directed graph can be constructed as follows (Figure [3]). 
The vertices correspond to microstates, consisting of the patterns of ligand bind- 
ing in each substrate state. The edges correspond to transitions between substrate 
states, with ligand binding unaltered, or to binding or unbinding of the ligands, with 
substrate state unaltered. Of these edges, ligand binding has a label of the form 
k[L], where A; is a rate constant and [L] is a concentration, taken either at steady 
state or as slowly varying when L is a slow variable; all other edges have only a 
rate constant as label. Provided the substrate is not a ligand for itself, so that the 
uncoupling condition is satisfied, and the graph is strongly connected, as is the case 
in most applications, the MTT allows the microstates to be eliminated in favour of 
the ligands. Most quantities of biological interest can be calculated in terms of the 
resulting expressions (Figure [3] and the Appendix). 

An important special case is when the system can reach thermodynamic equilib- 
rium (Appendix). In this case, detailed balance (DB) provides a simpler alternative 
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to the MTT. According to DB, which follows from the fundamental reversibility of 
microscopic dynamics at equilibrium [23 J, each edge is reversible and any pair of 
reversible edges, i j and j i, is independently at kinetic equilibrium. Hence, 
given any steady state x, Xj = {a/b)xi, irrespective of any other edges that impinge 
on i or j. Since each edge is reversible, the graph is strongly connected. Starting 
from a reference microstate, 1, and taking a path of reversible edges to j, we find 
that Xj = ajXi. Just as in (|2]), each xj can be eliminated in favour of rational ex- 
pressions in the labels. At equilibrium, DB cuts down the rooted trees of the MTT 
to a single path from 1. 

There may be many such paths. However, the rate constants are not free to 
vary arbitrarily. DB requires that they yield the same aj no matter what path is 
taken from 1 to j. These constraints may be summarised in the "cycle condition": 
for any cycle of reversible edges, the product of the rate constants on clockwise 
edges equals the product on counterclockwise edges (Figure [3}\). This condition is 
necessary and sufficient for aj to be independent of the path taken and for every 
equilibrium state to satisfy DB (Appendix). 

Equilibrium ligand binding has usually been analysed by statistical mechanical 
methods, [|l3l|42]|, as in protein allostery, [[261 [28l, and gene regulation, lfTl[33l[3]|. 
The linear framework gives identical results from a more kinetic perspective. Its 
main advantge is that it also applies away from equilibrium. For instance, in the 
yeast phosphate control system, nucleosome organisation at the PH05 promoter in- 
fluences its gene regulation function (GRF) in response to the transcription factor 
Pho4, lfT6l . Nucleation and disassembly of nucleosomes is a dissipative process. 
However, the GRF may still be calculated from the appropriate graph — Figure 4B 
in [16J — using the MTT. The linear framework is well suited to the modern pro- 
gramme of unravelling complex GRFs, |[3l [TTll 

Enzyme kinetics 

The fundamental basis of enzymology is that enzymes act through intermediate 
enzyme-substrate complexes, [[25l IH, (Figure |4]). Under in-vitro conditions, in 
which substrate is in excess, a time-scale separation may be assumed, with the in- 
termediate complexes quickly reaching steady state, while conversion of substrate 
to product takes place more slowly. This is the quasi-steady state approximation, a 
version of which goes back to Michaelis and Menten, [[25101. A labelled, directed 
graph can be constructed in which the vertices correspond to the intermediates and 
the free enzyme, with edges derived from the reaction mechanism. The labels can be 
chosen so that the differential equations of the linear Laplacian dynamics coincide 
with the full nonlinear ODEs. Since free substrate and free product are distinct from 
the intermediate complexes and the enzyme, the uncoupling condition is readily sat- 
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isfied. Because intermediates eventually break up to release enzyme, the graphs are 
naturally strongly connected. The MTT and formula ([2]) can then be used to elim- 
inate the intermediates and the free enzyme in favour of substrates and products, 
from which the enzymatic rate function can be calculated (Figure |4] and Appendix). 

In the biochemical literature, such calculations are done by the King-Altman 
procedure, HHHH, which is a restatement of the MTT. King-Altman has been widely 
used to calculate rate functions for complex enzymatic mechanisms with multiple 
ligands, affectors and intermediates, EHHl. The linear framework both encom- 
passes this and shows how it can be integrated into the analysis of multi-enzyme 
systems, as described next. 

Post-translational modification (PTM) 

Many proteins are covalently modified by the attachment of small chemical or pep- 
tide moieties, such as phosphate or ubiquitin, to specific residues, ll4Tll . PTM may 
involve multiple types of modifiers on multiple sites. Different global patterns of 
modification, or "modforms", may have different downstream effects, while the dis- 
tribution of modforms is dynamically regulated by forward modifying and reverse 
demodifying enzymes acting in opposition, [[3T1l . (Figure [5]). PTM is believed to im- 
plement adaptive cellular information processing on physiological time scales, as, 
for instance, in "PTM codes", [[T5l[39l. The linear framework enables quantitative 
analysis despite the resulting dynamical and combinatorial complexity [38, 37 J. 

Consider a single substrate, S, that supports multiple types of modification at 
multiple sites by multiple forward and reverse enzymes. Combinatorial explosion 
may lead to enormous numbers of modforms, depending on the numbers of sites 
and types of modification. A directed graph can be formed in which the vertices are 
the modforms and there is an edge between two modforms if there is some enzyme 
(there may be several) that catalyses the corresponding change in modification state. 
It is typically the case that any modification can be eventually undone by some other 
enzyme, so this modform graph is naturally strongly connected. 

The labels emerge from a separation of time scales. The donor molecules, 
such as ATP in the case of phosphorylation, and their breakdown products, such 
as ADP and phosphate, are assumed to be kept at constant concentration over the 
time scale of the modification dynamics by cellular processes that are not explicitly 
modelled. The modifier species can then be ignored as dynamical variables and 
enzyme reaction schemes can be simplified to involve only formation and break- 
down of intermediate complexes and conversion between intermediate complexes 
(Figure [5]). Realistic enzyme mechanisms may be assumed that vary for different 
substrate modforms. The mechanisms can be analysed using the linear framework, 
as explained in the previous section, yielding expressions from which the labels for 
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the modform graph can be assembled (Figure [5} caption). The uncouphng condition 
becomes restrictive here, since it requires that no substrate is also a modifying or de- 
modifying enzyme. The differential equations arising from the Laplacian dynamics 
then recapitulate the full nonlinear ODEs. 

Because the modform graph is strongly connected, the MTT can be applied to 
eliminate the modforms in favour of the enzymes. This is hierarchical elimination: 
the intermediates are first eliminated in favour of the modforms and the enzymes; 
the modforms are then eliminated in favour of the enzymes. We deduce that, despite 
the overwhelming combinatorial complexity arising from multisite modifications, 
the number of algebraically independent quantities at steady state is just the number 
of enzymes. This is usually very much smaller than the number of modforms. 
All other steady state concentrations are rational expressions in the free enzyme 
concentrations, with the expressions coming from ([2]). As for the enzymes, the 
total amount of each enzyme is conserved, which gives sufficiently many algebraic 
equations for the free enzyme values to be determined. 

We see that the steady states of a PTM system can be calculated algebraically, 
without the need for numerical simulation, and without prior knowledge of any 
parameter values. This may be done irrespective of the number of modifications, the 
number of modification sites and the complex details of the enzyme mechanisms. 

This capability has yielded several insights, [I38ll24||. For instance, it has iden- 
tified the first biochemical mechanism capable of implementing a "PTM code" and 
shown that its information capacity is potentially unlimited, ll38ll . 



Synthesis and degradation 

The linear framework also provides a foundation for new types of analysis. An 
aspect of the applications above is that synthesis and degradation were ignored. 
This is tantamount to another assumption of time-scale separation, since cellular 
components are always being turned over. We now analyse what happens when this 
assumption is dropped. 

Consider, as before, a labelled, directed graph, G, on vertices 1, • • • , n. Allow 
each vertex, i, to have a partial labelled edge leading in, z, and out, i % , 
corresponding to zero-order synthesis or first-order degradation of i, respectively 
(Figure [6}\). By allowing = or rf^ = 0, each vertex may have any combi- 
nation of synthesis and degradation, including neither or both. The degradation 
label di has the usual units of (time)"^ but the synthesis label must have units of 
(concentration) (time)" ^. Call this "partial graph" G+. As before, there is a linear 
dynamics on G+, which may be described by the system of differential equations 

— = C{G).x-/^.x + A. (7) 
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Here, A is a diagonal matrix with A^^ = di and ^4 is a column vector with Ai = ai. 
Note that, unlike ([T]), the equations in (|7]) are non-homogeneous: if x is a steady 
state of ([t]), it does not follow that Xx is also a steady state. Because 1^.C{G) = 0, 
if X is a steady state of (|7]), then 

diXi H h dnXn = ai H \- (8) 

which reflects the fact that synthesis and degradation must be in overall balance. 

When there is neither synthesis nor degradation, a general graph may have sev- 
eral degrees of freedom at steady state, reflected in the size of the basis in ([3]). These 
free quantities are ultimately determined by the initial conditions. With synthesis 
and degradation, some of these degree of freedom may be lost, as the total amount 
of matter is no longer conserved. This is reflected in the loss of homogeneity in (|7]). 
The system may not reach a steady state unless synthesis and degradation can find 
a balance. 

Construct a new labelled, directed graph G* by adding a vertex * to G (Fig- 
ure [6^). For each partial edge i with > or i with di > 0, introduce the 
edges * i or i ^ * in G*, respectively. Unlike G* is a directed graph with 
positive labels, whose Laplacian dynamics are governed by ([!]). It is easy to see that 
(xi, • • • , Xn) is a steady state of G+ if, and only, (xi, • • • , x^, 1) is a steady state of 
G*. The condition for vertex * to be at steady state in G* corresponds exactly to 
equation ^ for synthesis and degradation to be in balance in G+. 

This enables a complete description of the steady states of G+ but we focus here 
on the case that is most relevant to the applications. If G* is strongly connected, so 
that the MTT gives p as a basis element for the kernel of >C(G*), then G+ has a 
unique steady state x for which 

x^ = ^. (9) 

The single degree of freedom in G* has been used in ([9]) to ensure that = 1. 
Notice that G* may be strongly connected even though G itself is not (Figure |6]), so 
that ([9]) applies to a broader class of graphs than does the MTT itself. 

Equation ([9]) may be used to revisit the applications above to understand the im- 
pact of synthesis and degradation. It also opens up for analysis a broad range of new 
biological contexts. For instance, regulated degradation is a frequently used mech- 
anism in several signal transduction pathways, such as the Wnt/beta-catenin and 
death-receptor pathways, [[2T1I291, which also make abundant use of reversible lig- 
and binding and post-translational modification. Analysis of these using the linear 
framework is work in progress. 
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Conclusions 



Time-scale separation, leading to elimination of internal complexity, has been a 
fundamental method for analysing biochemical systems, from the earliest days of 
single-enzyme biochemistry through molecular biology to modern studies of multi- 
component systems. The framework shows that these calculations, which were 
previously considered distinct, are, in fact, the same. Moreover, they are all linear. 
The linearity hinges on the uncoupling condition, which allows nonlinearity in the 
dynamical variables to be traded for algebraic complexity in the labels. The fact that 
uncoupling is feasible in so many different contexts indicates a remarkable degree 
of linearity concealed within nonlinear biochemistry, a surprising insight that is 
amplified by the results of CRNT in Figure [2| The framework brings systematic 
techniques, clarity and pedagogical coherence to the field and lays a foundation for 
developing new methods of analysis. 

One intriguing direction to explore is the extension of the framework from the 
steady state to the dynamics. The problem of whether time-scale separation yields 
a good approximation of the dynamics can be studied by the method of singular 
perturbation, [11 J. However, this has only been undertaken for a limited number 
of biological examples. The framework provides the means to formulate such an 
analysis in a far more general way. 

In contrast to simulations, for which all details most be specified in advance, the 
framework yields results that hold irrespective of the underlying molecular com- 
plexity. It is, therefore, well suited for distilling biological principles without be- 
coming mired in the molecular details, a much needed facility for modern biology. 

APPENDIX 

Laplacian matrices and the MTT 

Matrices similar to the Laplacian in equation ([T]) were first introduced for unla- 
belled, undirected graphs by Gustav Kirchhoff in his 1847 paper, [191 , whose title, 
in English translation, ''On the solution of the equations obtained from the inves- 
tigation of the linear distribution of galvanic currents'', suggests its origins in his 
well-known studies of electrical circuits. In this form, the Laplacian may be seen as 
a discrete version of the continuous Laplacian operator but the same name is used 
for different versions and normalisations, [5J. The concept of a spanning tree and a 
result similar to the Matrix Tree Theorem also make their appearance in Kirchhoff 's 
paper. This seems to be the first of many subsequent Matrix Tree Theorems; see [[27l 
Chapter 5] for historical references. Several deep properties of graphs emerge from 
the spectral theory of Laplacian matrices [5J. Bill Tutte, one of the founders of mod- 
ern graph theory, extended the concepts to directed graphs and proved the version 
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of the MTT used here, [H. 



Kernel of the Laplacian for a general graph 

We sketch a proof of equation ([3]) which gives a basis for the kernel of the Laplacian. 
While the essential ideas are introduced we leave it to the reader to fill in some of 
the details. Let G be an arbitrary labelled, directed graph on the vertices, 1, • • • , n. 
As always, we assume that G has no self loops. Choose x G k.ei C{G). Let G be 
the acyclic directed graph on the strongly connected components (SCCs) of G, as 
in Figure [?]□. Suppose that the vertices of G are ci, • • • , and that ci is an initial 
sec that is not also terminal. By construction, there must be some vertex, zi G ci, 
with an edge leaving ci, fc, where k ^ ci. If > 0, there is a positive flux 
of material along this edge. For x to be a steady state, this flux must be balanced 
by some flux coming into ii. This can only arise from some edge i2 h with 
Xi^ > 0. Taking all such vertices, recursively, yields a subset of vertices that can be 
the only source of the balancing flux into ii. However, because ii is an initial SCC, 
this subset is entirely contained in ci. Since this SCC has only a limited amount 
of material, it cannot indefinitely balance the outgoing flux on the edge ii ^ k. li 
follows that Xi^ < 0. However, if Xi^ < then there is positive flux coming into ii 
along the edge ii ^ k. This can only be balanced by an edge is ^ H with Xi^ < 0. 
Arguing recursively in a similar way as above yields a similar contradiction. We 
conclude that Xi^ = 0. But then xj = for any vertex j with j ^ ii. Since ci is 
strongly connected, it is then easy to see that xj = for any j G ci. It follows that 
X has no support on any initial SCC that is not also terminal. (The support of x is 
the subset of vertices, i, such that Xi ^ 0.) 

It is now easy to argue by induction over those SCCs that are not terminal to 
show that the support of x contains only vertices that are in terminal SCCs. Consider 
each terminal SCC, t, as a labelled, directed graph, G^, in its own right, in isolation 
from the rest of G. Assume that Gt has Ui vertices. Let x^ G M^* be the vector 
obtained from x by restricting x to those vertices lying in t. Since x has no support 
outside the terminal SCCs and there are no edges between the terminal SCCs, it 
should be clear that x^ G ker£(Gt). Let G M^* is the vector coming from the 
MTT applied to G^ . Since t is strongly connected and dimkeri2(Gt) = 1, it must 
be that x^ = A^^^ for some G M. Now let G be the vector constructed for 
equation ([3]), 

, _ f {v% ifzGt 
\ otherwise. 

Since the terminal SCCs are disjoint, the vectors, , - - - , p^, are linearly indepen- 
dent by construction. Evidently, x = Y^=\ ^ p^ - Hence, these vectors form a basis 
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for the kernel of the Laplacian, 



ker£(G') = (p\---,p^), 

which proves equation ([3]). 

Ligand binding at thermodynamic equilibrium 

Consider the labelled, directed graph, G, arising from the binding of multiple lig- 
ands to multiple sites on a substrate that may exist in multiple states, as discussed in 
the paper. The microstates are assumed to be encoded in some way, as in Figure |3| 
and are enumerated simply as 1, • • • , n. Edges correspond either to changes in state 
of the substrate, with ligand binding unaltered, or to ligand binding or unbinding, 
with substrate state unaltered. Assuming that the system can reach thermodynamic 
equilibrium, each edge is reversible and edges can therefore be treated in pairs, 

• ""tj . ■ %j ■ 
J J ^ 

A ligand binding edge is assumed to have a label, a^j = k^j [L^], where k^j is a rate 
constant and [L^] is the concentration of one of the ligands, treated either at steady 
state or as slowly varying. For all other edges, the label is a rate constant. 

If X is a steady state of G — in other words, if x G ker£(G) — then x satisfies 
DB if each reversible edge is independently at kinetic equilibrium. In other words, 
whenever there is a reversible edge, the forward and reverse fluxes are balanced. 

The cycle condition on G states that, for any cycle of reversible edges, the product 
of the rate constants on the edges going clockwise is equal to the product of the 
rate constants on the edges going counterclockwise. We want to show that the cycle 
condition holds on G if, and only, if every steady state satisfies DB. 

Suppose first that x satisfies DB. Since the net flux through any reversible edge 
is zero, the net flux around any cycle of reversible edges is also zero. We know from 



([T0|) that 

Xj Ki^jXi , (11) 

where Ki^j = al^/a^j. Choose any cycle of reversible edges and pick any two 
vertices on it, say and /. The cycle can be broken into a pair of directed paths 
from to /. Applying ( [T7] ) repeatedly on each path gives two expressions for 
Xjf in terms of x^/. Equating these expressions, canceUing ligand concentrations 
and clearing denominators, yields the cycle condition. Since the cycle was chosen 
arbitrarily, this proves the first part. 
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Now suppose the cycle condition holds. Let x be any steady state. We need to 
show that X satisfies DB. We construct an alternative steady state y, which we show 
to satisfy DB, and then prove that y = x. Assume that the reference microstate, 1, 
has no ligands bound, and set yi = xi. For any other microstate j, choose some 



path of reversible edges from 1 to j and use ( 1 1 ) to express yj in terms of yi. Now 
choose some other path from 1 to j and obtain a second expression for yj in terms 
of yi. The two paths together form a cycle of reversible edges, to which the cycle 
condition applies. Reorganising the cycle condition and putting in the appropriate 
ligand concentrations shows that the two path expressions give the same result for 
yj. Hence, this quantity is well defined, irrespective of the path chosen. 

We have unambiguously defined a state, y, of G but we have yet to show that 
it is a steady state. Consider any reversible edge between the microstates i and j. 
Choose a pair of reversible paths from 1 to i and from 1 to j. Together with the 
reversible edge between i and j, this gives a cycle of reversible edges. Applying the 
cycle condition, it is easy to see that, in the state y, the reversible edge between i and 
j must be in kinetic equilibrium. This not only implies that y is a steady state but 
also that y satisfies DB. But now, G is strongly connected and so dim ker JC{G) = 1. 
Hence, y = Xx for some A G M. Since yi = xi, X = 1. Hence, y = x and therefore 
X satisfies DB. This completes the proof. 

If the reference vertex, 1, has no ligands bound, then, in any steady state x, the 
quantity Xi/xi is a monomial in the ligand concentrations and the power to which 
[Lu] appears is the number of molecules bound in microstate i. Hence, the 
concentration of states in which is bound is given by 

[Lu\{dxtot/d[Lu]) 

and the "fractional saturation", or average concentration of states bound by L^, is 
the logarithmic derivative, 

\xtot J o[Lu\ 

More complex aggregate concentrations can be worked out in a similar way. 

The calculation of Xtot can be simplified by suitably decomposing the graph, as 
illustrated by the sum and product formulae below. 

DB implies that any steady state x^ of G gives, by restriction, a steady state 
x^ of any subgraph, R. If R and T are subgraphs that are disjoint (no vertex in 
common), which together span G, we get the sum formula 

{x%ot = {x%ot + {x^)tot . (13) 

If ligands bind independently, so that the site-specific rate constants are indepen- 
dent of the microstate in which ligand binds, then the graph may be decomposed 
into a product of the graphs for single site binding. The product of two graphs is 
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defined as follows. Suppose that G is a labelled, directed graph on the vertices 
gi,' ' ' .Qn and that i7 is a labelled, directed graph on the vertices /ii, • • • , /i^. The 
product G X H is the labelled, directed graph on the vertices gi x hj in which there 
is an edge 

gii X hj, A gi^ X hj^ 
whenever there is an edge gi^ A gi^ in G and, symmetrically, there is an edge 

whenever there is an edge hj^ A hj^ in H, There are no edges inG x H other than 
these. This construction captures the fact that a change in state of either factor is 
independent of the state of the other factor. 

The steady state of a product may be obtained from those of its factors as fol- 
lows. Define the normalised total steady state by 7r{G) = Xtot/xi, where x is any 
steady state. It follows from equation [2] that 7r(G) is independent of x, although it 
may depend on the choice of reference vertex. With 1 x 1 as the reference inGx H, 
it is not difficult to prove the product formula, 

7r{G xH) = 7r{G) x 7r{H) . (14) 

Independent binding allows 7r{G) to be factorised. 

Formulae ( [T2| ), ([13]) and ( [T4| ) are helpful for the typical calculations arising in 



studies of gene regulation or protein allostery. 



Enzyme kinetics 

The details of the calculation of the enzymatic rate formula in Figure |4] are shown 
in Figure |7j The rate of product formation is given by 

^ = kUAYp] - K+iiPm . (15) 

Using the ordering in Figure [7| in which vertex p+1 corresponds to the elimina- 
tion formula in equation g gives [Yp] = {pp/ ptot)Etot and [E] = {pp+i/ ptot)Etot' 
Hence, 

^ = iK^iP, - KAP]Pv+i) . (16) 

The spanning trees of an isolated cycle are easily enumerated (Figure |7]C) and the 
MTT shows that p^+i = 7 and pi = ai[S] + l5i[P], for i < p, where 7, a^, /3i are 
polynomials in the rate constants. Hence, 

Ptot = 7 + f E [S] + [P] . (17) 



Vi=l / \i=l 
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Comparing the spanning trees for vertices p andp+1 reveals substantial cancellation 
when calculating the pre-f actor in ( [T6| ) (Figure |7]C). This simplifies to the difference 
between the product of the labels going clockwise around the cycle and the product 
of the labels going counterclockwise, 



" V ' V ' 

labels on CW edges labels on CCW edges 



(18) 



Note that the term on the right in ( [15] ) is the steady-state net flux around the isolated 
cycle in Figure [7^. When this is zero, ( [T8| ) shows that the product of the clockwise 
labels equals the product of the counterclockwise labels. This gives another proof 
of the cycle condition, discussed in ^ which holds at thermodynamic equilibrium. 
Combining ([TT]) and ([T8]) and normalising appropriately yields the rate formula 



dt 



Vs 



(m - VP (m 



tot 



(19) 



where vg, vp. Kg, Kp are rational expressions in the rate constants. This is the 
reversible Michaelis-Menten formula [6J. Note that this has the same form irre- 
spective of the number of intermediates. 
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labelled, directed graph, G 



mass-action 



kinetics 



dt 



(^1 



Laplacian matrix of G 



-(/.2 + %) 



k^ J\ X3 



B 



spanning trees rooted at each vertex 



strongly connected graph 



Laplacian 

-(k2^ks) ki 

k2 -ki k^ 

k3 -/v4 



= E n H 

Qi(G) — {spanning trees rooted at /[ 



^0 



kernel of the Laplacian 



A" 2 ^'4 + k^k^ 
\ k-lk-s 



directed graph, G 



acyclic, directed graph, G, on SCCs 
terminal SCC 



initial SCC 



Figure 1 : The linear framework. (A) A labelled, directed graph, G, gives rise to a 
system of linear differential equations by treating each edge as a first-order chemi- 
cal reaction under mass-action kinetics, with the label as rate constant. The corre- 
sponding matrix is the Laplacian of G. (B) In a strongly connected graph (note the 
difference to the one in A), there are spanning trees rooted at each vertex, the roots 
being circled. The MTT gives an element of ker C{G) according to the formula in 
the box, as explained in the text. For more examples of spanning trees see Figures 1 
and 2 of the Appendix. (C) In a general directed graph, G, two distinct vertices are 
in the same strongly connected component (SCC) if each can be reached from the 
other by a path of directed edges. The SCCs form a directed graph, G, in which 
two SCCs are linked by a directed edge if some vertex of the first SCC has an edge 
to some vertex of the second SCC. G has no directed cycles, allowing initial and 
terminal SCCs to be identified. 
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E + Sq 4 ► ESo ► E + Si 
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F + Si 4 ► FSi ► F + So 



Figure 2: Chemical Reaction Network Theory. A reaction network, A^, is shown 
(bottom right) for a substrate S existing in two states of modification, and Si, 
which are inter-converted by enzymes E and F. Each enzyme uses the classi- 
cal Michaelis-Menten reaction mechanism, with enzyme- substrate complexes, ES^ 
and FSi. Mass-action kinetics gives a system of nonlinear differential equations, 
dc/dt = /(c), where q is the concentration of species i. The component functions 
/i(c), • • • , /6(c) are listed. The network gives rise to the labelled, directed graph 
Gn on complexes (top right). The nonlinear function / may be decomposed into 
the linear Laplacian, C{Gn), as defined in Figure [T|\, and two linking functions, 
a linear function Y and a nonlinear function Formally, if ui denotes the mul- 
tiplicity of species i in complex u, then, for c G M^, ^{c)u is the corresponding 
mass-action expression, ^{c)u = • • • and if z = (0, • • • , 1, • • • , 0) is the ba- 
sis element of corresponding to complex u, then Y{z) is the list of multiplicities 
in z, Y[z)i = Ui, With these definitions, the diagram in the centre commutes: 
/(c) = YH{G^)^. 
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Figure 3: Reversible ligand binding. (A) Gene regulation, [[Tl[33l[3l[l7]|. Transcrip- 
tion factors may oligomerise before binding to DNA and initiating gene transcrip- 
tion at rates that depend on the pattern of ligand binding. A labelled, directed graph 
can be constructed as described in the text. Separating time scales as shown, the 
overall transcription rate as a function of oligomerised transcription factor concen- 
trations (the gene regulation function), is the average rate, weighted by the probabil- 
ity of the promoter having the corresponding pattern of ligand binding. Probabilities 
are ratios xoo/^tot, • • • , xn/xtot in any steady state x of the Laplacian dynamics on 
the graph. In this example, reactions are asumed to take place at thermodynamic 
equilibrium, without dissipative changes, such as nucleosome reorganisation. (B) 
Protein allostery, [[26l|28l. An allosteric dimer is shown in two quaternary states, 
relaxed and tense. Ligand can bind to each monomer on a fast time-scale com- 
pared to catalytic activity of the protein. The labelled, directed graph has both 
quaternary state changes (relaxed to tense and vice versa) and ligand binding and 
unbinding, with the corresponding reactions being assumed to take place at themo- 
dynamic equilibrium. Labels have been omitted for clarity. For allosteric enzymes, 
the overall rate is assumed to be proportional to the fraction of sites that are bound 
by ligand (the fractional saturation), which can be directly calculated as described 
in the Appendix. 
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reaction meclianism of enzyme E 

k 1 k 2 k p k p+1 

S + E Yi . . . 4==^ Yp ^ ► E + P 

▲ ki k2 kp k p^f I 

slow fast ► slow 



labelled, directed graph 

^ k 2 k p 

^1 ^ ► . . . ^ ► p 

k%[S] \\ 

• 

enzyme rate function 

Figure 4: Enzyme kinetics. A reaction mechanism is shown for enzyme E convert- 
ing substrate S into product P via the intermediate complexes Fi, • • • , 1^. With the 
indicated separation of time scales, a labelled, directed graph can be contructed with 
vertices 1, • • • corresponding to Fi, • • • , 1^, respectively, and an additional vertex 
p+ 1, corresponding to the free enzyme, E. The two edges leading out of p+ 1, rep- 
resenting the formation of intermediate complexes, acquire algebraic expressions as 
labels, while all other edges have the corresponding rate constants. With this choice 
of labels the linear Laplacian dynamics on the graph recapitulates the full nonlinear 
dynamics of the reactions. The MTT can be used to be used to calculate the rate 
of product formation, dP/dt = — k~^i[P][E], as explained in the SOM, 

leading to the reversible Michaelis-Menten formula, 
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Figure 5: Post-translational modification. A hypothetical example is shown with 
eight modforms of a substrate with two phosphorylation sites and one methyla- 
tion site, acted on by kinases Kl and K2, methyltransferase M, phosphatase F 
and demethylase D. The system is coupled upstream to core metabolism that re- 
news the donor molecules (for methylation, SAM is S-adenosyl methionine, SAH 
is S-adenosyl homocysteine and CH2O is formaldehyde) and downstream to the bi- 
ological processes influenced by PTM . Assuming time-scale separations as shown, 
the PTM system gives rise to a directed graph on the modforms, one edge of which, 
from 000 to 010, is highlighted in the box. This edge is catalysed by Kl and K2 
through the individual reaction mechanisms shown, in which the modifier species 
can be ignored because of the time-scale separation. Kl acts sequentially, produc- 
ing only 010 from 000; K2 can produce both 010 and 100 in a random, distributive 
manner as well as the doubly phosphorylated 110 processively. Also shown are 
the corresponding graphs on the intermediate complexes, constructed as in Fig- 
ure |4} with labels omitted for clarity. Using the MTT yields expressions whose 
coefficients, aooo, <^oio, &110. may be regarded as the reciprocals of generalised 
Michaelis-Menten constants. The appropriate label on the edge 000 010 can 
then be assembled as a linear combination of the steady-state concentrations of Kl 
and K2, with coefficients that are generalised catalytic efficiencies. See ll37ll for 
further details. 
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Figure 6: Synthesis and degradation. (A) The non-strongly connected graph in 
Figure [T|\ is augmented with partial edges denoting synthesis and degradation to 
form the partial graph G+. For clarity, only those partial edges with non-zero labels 
are shown. Under mass-action kinetics, gives rise to a non-homogenous system 
of linear ODEs. (B) By introducing a new vertex, *, the labelled, directed graph, G*, 
can be formed, which, in this case, is strongly connected, with the corresponding 
Laplacian. Using the MTT to calculate p G keri2(G*), as shown (the spanning 
trees are enumerated in Figure 2 of the SOM), the unique steady state of can be 
calculated as = p^/p*, for 1 < i < 3. 
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Figure 7: Enzyme kinetics. (A) The reaction mechanism from Figure |4} (B) The 
corresponding labelled, directed graph. Note that its orientation is different from 
that shown in Figure |4| Note that only the two outgoing edges from vertex p+1 
have algebraic expressions for labels. (C). Enumeration of the spanning trees rooted 
at vertex p (top), corresponding to 1^, and vertex p+1 (bottom), corresponding to 
E. The spanning trees for any root may be constructed by choosing a gap between 
adjacent vertices, with the edges running in opposite directions to the root on either 
side of the gap. Using the labels in B and the MTT formula from Figure [T^, it 
can be checked that only p^+i has no occurrences of [S] or [P], while each p^, for 
i ^ p + 1, has either one [S] or one [P] but not both. This proves equation (17). 
Considering the depicted trees in vertical pairs, it can similarly be checked that the 
pre-factor in equation ( [T6| ) reduces to the two terms coming from the last pair of 
trees, giving the formula in equation (fTSl). 
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Figure 8: Spanning trees for the labelled, directed graph in Figure [6^. The 19 trees 
are listed, with each root indicated by a black circle around the corresponding ver- 
tex. The formulas for pi, p2, P3, P* in Figure [6^ can be read off according to the 
MTT formula in Figure [Tp. 
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